Africa Government
Mafoko: Structuring and Building Open Multilingual Terminologies for South African NLP
Marivate, Vukosi, Dzingirai, Isheanesu, Banda, Fiskani, Lastrucci, Richard, Sindane, Thapelo, Madumo, Keabetswe, Olaleye, Kayode, Modupe, Abiodun, Netshifhefhe, Unarine, Combrink, Herkulaas, Nakeng, Mohlatlego, Ledwaba, Matome
The critical lack of structured terminological data for South Africa's official languages hampers progress in multilingual NLP, despite the existence of numerous government and academic terminology lists. These valuable assets remain fragmented and locked in non-machine-readable formats, rendering them unusable for computational research and development. Mafoko addresses this challenge by systematically aggregating, cleaning, and standardising these scattered resources into open, interoperable datasets. We introduce the foundational Mafoko dataset, released under the equitable, Africa-centered NOODL framework. To demonstrate its immediate utility, we integrate the terminology into a Retrieval-Augmented Generation (RAG) pipeline. Experiments show substantial improvements in the accuracy and domain-specific consistency of English-to-Tshivenda machine translation for large language models. Mafoko provides a scalable foundation for developing robust and equitable NLP technologies, ensuring South Africa's rich linguistic diversity is represented in the digital age.
- Africa > South Africa > Gauteng > Pretoria (0.05)
- North America > Canada > Ontario > National Capital Region > Ottawa (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (4 more...)
- Education (0.95)
- Government > Regional Government > Africa Government > South Africa Government (0.34)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
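The Mafoko abstract above describes feeding terminology into a Retrieval-Augmented Generation pipeline for English-to-Tshivenda translation. The snippet below is a minimal sketch of that general idea, not the authors' implementation: it looks up glossary terms that occur in a source sentence and prepends them to a translation prompt. The glossary contents, the function names `retrieve_terms` and `build_prompt`, and the prompt wording are all hypothetical, and the target-side strings are placeholders rather than real Mafoko entries.

```python
import re

# Hypothetical glossary: English term -> target-language term.
# The values are placeholders, not real Mafoko data.
GLOSSARY = {
    "budget vote": "<target term A>",
    "municipality": "<target term B>",
    "budget": "<target term C>",
}


def retrieve_terms(sentence, glossary):
    """Return (source, target) pairs whose source term occurs in the sentence.

    Matching is case-insensitive and on word boundaries; longer terms
    are listed first so multi-word entries appear before their parts.
    """
    text = sentence.lower()
    hits = []
    for term in sorted(glossary, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text):
            hits.append((term, glossary[term]))
    return hits


def build_prompt(sentence, glossary, target_language="Tshivenda"):
    """Assemble a translation prompt augmented with retrieved terminology."""
    hits = retrieve_terms(sentence, glossary)
    lines = [f"Translate the following English sentence into {target_language}."]
    if hits:
        lines.append("Use these terminology pairs where they apply:")
        lines.extend(f"- {src} -> {tgt}" for src, tgt in hits)
    lines.append(f"English: {sentence}")
    lines.append(f"{target_language}:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("The municipality approved the budget vote.", GLOSSARY))
```

The resulting string would be passed to whichever language model is being evaluated. A real pipeline would likely replace the exact-match lookup with fuzzy or embedding-based retrieval.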
I built this 'AI aunt' for women after family tragedy in South Africa
A gruesome killing in her own family inspired South African Leonora Tima to create a digital platform where people, mostly women, can talk about and track abuse. Leonora's relative was just 19 years old, and nine months pregnant, when she was killed, her body dumped on the side of a highway near Cape Town in 2020. "I work in the development sector, so I've seen violence," Leonora says. "But what stood out for me was that my family member's violent death was seen as so normal in South African society. Her death wasn't published by any news outlet because the sheer volume of these cases in our country is such that it doesn't qualify as news."
- Africa > Tanzania (0.29)
- Africa > South Africa > Western Cape > Cape Town (0.25)
- South America (0.14)
- (18 more...)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.72)
- Law > Civil Rights & Constitutional Law (0.70)
- Government > Regional Government > Africa Government (0.47)
ImCoref-CeS: An Improved Lightweight Pipeline for Coreference Resolution with LLM-based Checker-Splitter Refinement
Luo, Kangyang, Bai, Yuzhuo, Si, Shuzheng, Gao, Cheng, Wang, Zhitong, Shen, Yingli, Li, Wenhao, Liu, Zhu, Han, Yufeng, Wu, Jiayi, Kong, Cunliang, Sun, Maosong
Coreference Resolution (CR) is a critical task in Natural Language Processing (NLP). Current research faces a key dilemma: whether to further explore the potential of supervised neural methods based on small language models, whose detect-then-cluster pipeline still delivers top performance, or embrace the powerful capabilities of Large Language Models (LLMs). However, effectively combining their strengths remains underexplored. To this end, we propose ImCoref-CeS, a novel framework that integrates an enhanced supervised model with LLM-based reasoning. First, we present an improved CR method (ImCoref) to push the performance boundaries of the supervised neural method by introducing a lightweight bridging module to enhance long-text encoding capability, devising a biaffine scorer to comprehensively capture positional information, and invoking a hybrid mention regularization to improve training efficiency. Importantly, we employ an LLM acting as a multi-role Checker-Splitter agent to validate candidate mentions (filtering out invalid ones) and coreference results (splitting erroneous clusters) predicted by ImCoref. Extensive experiments demonstrate the effectiveness of ImCoref-CeS, which achieves superior performance compared to existing state-of-the-art (SOTA) methods.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Austria > Vienna (0.14)
- (15 more...)
- Law (1.00)
- Government > Regional Government > Africa Government (0.68)
- Banking & Finance (0.68)
- (2 more...)
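The ImCoref-CeS abstract above mentions a biaffine scorer for mention pairs. As a rough illustration only, the snippet below computes the textbook biaffine score s(i, j) = h_i^T U h_j + w^T [h_i; h_j] + b for two small vectors. It does not reproduce the paper's variant (in particular, none of its positional modelling), and the names `biaffine_score`, `U`, `w`, and `b` are assumptions chosen for this sketch.

```python
def dot(u, v):
    """Plain dot product of two equal-length lists of numbers."""
    return sum(a * b for a, b in zip(u, v))


def biaffine_score(h_i, h_j, U, w, b):
    """Generic biaffine score for a pair of mention representations.

    h_i, h_j: vectors of dimension d (lists of floats)
    U:        d x d matrix (list of rows) for the bilinear term
    w:        vector of dimension 2d for the linear term
    b:        scalar bias
    """
    U_h_j = [dot(row, h_j) for row in U]   # U @ h_j
    bilinear = dot(h_i, U_h_j)             # h_i^T U h_j
    linear = dot(w, h_i + h_j)             # w^T [h_i; h_j] (list concatenation)
    return bilinear + linear + b


if __name__ == "__main__":
    h_i = [1.0, 0.0]
    h_j = [0.0, 1.0]
    U = [[0.0, 2.0], [0.0, 0.0]]
    w = [1.0, 1.0, 1.0, 1.0]
    print(biaffine_score(h_i, h_j, U, w, 0.5))  # 2.0 + 2.0 + 0.5 = 4.5
```

In a trained model, `U`, `w`, and `b` would be learned parameters and the vectors would come from an encoder. Here they are hand-picked so the arithmetic can be checked by eye.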
UN warns of potential 'ethnically driven' atrocities in Sudan's el-Fasher
At least 91 people have been killed in Sudan's besieged city of el-Fasher in attacks by the paramilitary Rapid Support Forces (RSF) over 10 days last month, the United Nations says. The attacks took place during intensified fighting between the RSF and Sudan's army around the city, the largest urban centre in the Darfur region that remains under the control of the military and its allies, known as the Joint Forces. UN rights chief Volker Turk said on Thursday that the city's Daraja Oula neighbourhood was repeatedly attacked and subjected to RSF artillery shelling, drone strikes and ground incursions from September 19 to 29. He called for urgent action to prevent "large-scale, ethnically driven attacks and atrocities in el-Fasher." He said "atrocities are not inevitable", adding that "they can be averted if all actors take concrete action to uphold international law, demand respect for civilian life and property, and prevent the continued commission of atrocity crimes".
- Africa > Sudan > North Darfur State > El Fasher (1.00)
- North America > United States (0.16)
- Asia > Middle East > Palestine > Gaza Strip > Gaza Governorate > Gaza (0.06)
- (8 more...)
- Government > Military (1.00)
- Government > Regional Government > Africa Government > Sudan Government (0.36)
South African-born Musk evoked by Trump during meeting with nation's leader: 'Don't want to get Elon involved'
President Donald Trump evoked Elon Musk during his Oval Office meeting with South Africa's president on Wednesday, during talks about the ongoing attacks white farmers in the country are facing. Trump went back and forth with President Cyril Ramaphosa over whether what is occurring in South Africa is indeed a "genocide" against white farmers. At one point, during the conversation, a reporter asked Trump how the United States and South Africa might be able to improve their relations. The president said that relations with South Africa are an important matter to him, noting he has several personal friends who are from there, including professional golfers Ernie Els and Retief Goosen, who were present at Tuesday's meeting, and Elon Musk. Unprompted, Trump added that while Musk may be a South African native, he doesn't want to "get [him] involved" in the ongoing foreign diplomacy matters that played out during Tuesday's meeting.
- Africa > South Africa (1.00)
- North America > United States (0.99)
- Government > Regional Government > North America Government > United States Government (0.62)
- Government > Regional Government > Africa Government > South Africa Government (0.56)
Low-Resource Neural Machine Translation Using Recurrent Neural Networks and Transfer Learning: A Case Study on English-to-Igbo
Ekle, Ocheme Anthony, Das, Biswarup
In this study, we develop Neural Machine Translation (NMT) and Transformer-based transfer learning models for translating English into Igbo, a low-resource African language spoken by over 40 million people across Nigeria and West Africa. Our models are trained on a curated and benchmarked dataset compiled from Bible corpora, local news, Wikipedia articles, and Common Crawl, all verified by native language experts. We leverage Recurrent Neural Network (RNN) architectures, including Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), enhanced with attention mechanisms to improve translation accuracy. To further enhance performance, we apply transfer learning using MarianNMT pre-trained models within the SimpleTransformers framework. Our RNN-based system achieves competitive results, closely matching existing English-Igbo benchmarks. With transfer learning, we observe a performance gain of +4.83 BLEU points, reaching an estimated translation accuracy of 70%. These findings highlight the effectiveness of combining RNNs with transfer learning to address the performance gap in low-resource language translation tasks.
- Africa > Sudan (0.28)
- Africa > West Africa (0.24)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- (12 more...)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Law (0.67)
- Government > Regional Government > Africa Government (0.46)
How drones killed nearly 1,000 civilians in Africa in three years
The use of drones by several African countries in their fight against armed groups is causing significant harm to civilians, according to a new report. More than 943 civilians have been killed in at least 50 incidents across six African countries from November 2021 to November 2024, according to the report by Drone Wars UK. The report, titled Death on Delivery, reveals that strikes regularly fail to distinguish between civilians and combatants in their operations. Experts told Al Jazeera that the death toll is likely only the tip of the iceberg because many countries run secretive drone campaigns. As drones rapidly become the weapon of choice for governments across the continent, what are the consequences for civilians in conflict zones?
- North America > United States (0.15)
- Asia > Middle East > Republic of Türkiye (0.06)
- Asia > China (0.06)
- (6 more...)
- Government > Military (1.00)
- Government > Regional Government > Africa Government (0.30)
The Esethu Framework: Reimagining Sustainable Dataset Governance and Curation for Low-Resource Languages
Rajab, Jenalea, Aremu, Anuoluwapo, Chimoto, Everlyn Asiko, Dunbar, Dale, Morrissey, Graham, Thior, Fadel, Potgieter, Luandrie, Ojo, Jessico, Tonja, Atnafu Lambebo, Chetty, Maushami, Nekoto, Onyothi, Moiloa, Pelonomi, Abbott, Jade, Marivate, Vukosi, Rosman, Benjamin
This paper presents the Esethu Framework, a sustainable data curation framework specifically designed to empower local communities and ensure equitable benefit-sharing from their linguistic resources. This framework is supported by the Esethu license, a novel community-centric data license. As a proof of concept, we introduce the Vuk'uzenzele isiXhosa Speech Dataset (ViXSD), an open-source corpus developed under the Esethu Framework and License. The dataset, containing read speech from native isiXhosa speakers enriched with demographic and linguistic metadata, demonstrates how community-driven licensing and curation principles can bridge resource gaps in automatic speech recognition (ASR) for African languages while safeguarding the interests of data creators. We describe the framework guiding dataset development, outline the Esethu license provisions, present the methodology for ViXSD, and present ASR experiments validating ViXSD's usability in building and refining voice-driven applications for isiXhosa.
- North America > United States (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- (5 more...)
- Information Technology (0.46)
- Government > Regional Government > Africa Government (0.46)
Kenya's President Wades Into Meta Lawsuits
Can a Big Tech company be sued in Kenya for alleged abuses at an outsourcing company working on its behalf? That's the question at the heart of two lawsuits that are attempting to set a new precedent in Kenya, which is the prime destination for tech companies looking to farm out digital work to the African continent. The two-year legal battle stems from allegations of human rights violations at an outsourced Meta content moderation facility in Nairobi, where employees hired by a contractor were paid as little as $1.50 per hour to view traumatic content, such as videos of rapes, murders, and war crimes. The suits claim that despite the workers being contracted by an outsourcing company, called Sama, Meta essentially supervised and set the terms for the work, and designed and managed the software required for the task. Both companies deny wrongdoing and Meta has challenged the Kenyan courts' jurisdiction to hear the cases.
- Africa > Kenya > Nairobi City County > Nairobi (0.27)
- Asia > Philippines (0.05)
- Asia > India (0.05)
- Africa > Uganda (0.05)
- Law > Litigation (1.00)
- Government > Regional Government > Africa Government > Kenya Government (1.00)
BOTS-LM: Training Large Language Models for Setswana
Brown, Nathan, Marivate, Vukosi
In this work we present BOTS-LM, a series of bilingual language models proficient in both Setswana and English. Leveraging recent advancements in data availability and efficient fine-tuning, BOTS-LM achieves performance similar to models significantly larger than itself while maintaining computational efficiency. Our initial release features an 8 billion parameter generative large language model, with upcoming 0.5 billion and 1 billion parameter large language models and a 278 million parameter encoder-only model soon to be released. We find the 8 billion parameter model significantly outperforms Llama-3-70B and Aya 23 on English-Setswana translation tasks, approaching the performance of dedicated machine translation models, while approaching 70B parameter performance on Setswana reasoning as measured by a machine translated subset of the MMLU benchmark. To accompany the BOTS-LM series of language models, we release the largest Setswana web dataset, SetsText, totalling over 267 million tokens. In addition, we release the largest machine translated Setswana dataset, the first and largest synthetic Setswana dataset, training and evaluation code, training logs, and MMLU-tsn, a machine translated subset of MMLU.
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)